
    Raster data server prototype: Rasdaman

    In this thesis, we assessed the capability of the Rasdaman software as a full-scale raster data server. In the domain of Geographic Information Systems (GIS), data servers hold an even more prominent position than in traditional information systems. Currently, many solutions, both proprietary and open source, exist for vector data and have been extensively tested on practical projects. Solutions for raster data, however, are scarcer. Of the few available solutions, we explored Rasdaman, a rather generic software for multi-dimensional arrays (MDA). As it turns out, Rasdaman has no built-in support for Geographic Coordinate Systems (GCS) and, as such, cannot be used alone, in spite of its MDA query language, which makes it a strong candidate for web services. Nonetheless, GCS support can be built on top of Rasdaman, and actually is, in the form of the Petascope plugin. All in all, the combination of Rasdaman, Petascope and PostGIS makes for a powerful and flexible data server for both vector and raster data.
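
    As an illustration of the kind of web-service access such a stack enables, the sketch below sends a WCPS (Web Coverage Processing Service) query to a Petascope endpoint over HTTP. The endpoint URL and coverage name are placeholders, and the exact request parameters may differ between Petascope versions; this is a rough sketch, not a configuration taken from the thesis.

```python
# Minimal sketch: querying a Petascope (rasdaman) endpoint with a WCPS expression.
# The host and coverage name below are hypothetical placeholders.
import requests

PETASCOPE_URL = "http://example.org/rasdaman/ows"  # hypothetical endpoint

# WCPS query: average value of a (hypothetical) coverage over a lat/long window.
wcps_query = """
for $c in (MyElevationCoverage)
return avg($c[Lat(45.0:46.0), Long(5.0:6.0)])
"""

params = {
    "service": "WCS",
    "version": "2.0.1",
    "request": "ProcessCoverages",
    "query": wcps_query,
}

response = requests.get(PETASCOPE_URL, params=params, timeout=30)
response.raise_for_status()
print("Average elevation:", response.text)
```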

    Supervised learning under constraints (Apprentissage supervisés sous contraintes)

    As supervised learning occupies a larger and larger place in our everyday life, it is met with more and more constrained settings. Dealing with those constraints is key to fostering new progress in the field, expanding ever further the limits of machine learning, a likely necessary step toward artificial general intelligence. Supervised learning is an inductive paradigm in which time and data are refined into knowledge, in the form of predictive models. These models can sometimes be, it must be conceded, opaque, memory-demanding and energy-consuming. In this setting, a constraint can mean any number of things. Essentially, a constraint is anything that stands in the way of supervised learning, be it the lack of time, of memory, of data, or of understanding. Additionally, the scope of applicability of supervised learning is so vast it can appear daunting. Usefulness can be found in areas including medical analysis and autonomous driving, areas for which strong guarantees are required. All these constraints (time, memory, data, interpretability, reliability) may conflict with the traditional goal of supervised learning. In such cases, finding a balance between the constraints and the standard objective is problem-dependent, which calls for generic solutions. Alternatively, concerns may arise after learning, in which case solutions must be developed under sub-optimal conditions, with constraints adding up. An example of such a situation is trying to enforce reliability once the data is no longer available.

    After detailing the background in which this thesis fits (what supervised learning is and why it is difficult, which algorithms will be used, and where it sits in the broader scope of knowledge), we discuss four different scenarios. The first one is about learning a good decision forest model of limited size, without first learning a large model and then compressing it. For that, we developed the Globally Induced Forest (GIF) algorithm, which mixes local and global optimization to produce accurate predictions under memory constraints in reasonable time. More specifically, the global part makes it possible to sidestep the redundancy inherent in traditional decision forests. The proposed method is shown to be more than competitive with standard tree-based ensembles under corresponding constraints, and can sometimes even surpass much larger models.

    The second scenario corresponds to the example given above: trying to enforce reliability without data. More specifically, the focus is on out-of-distribution (OOD) detection: recognizing samples which do not come from the original distribution the model was learned from. Tackling this problem with an utter lack of data is challenging. Our investigation focuses on image classification with convolutional neural networks. We propose indicators which can be computed alongside the prediction with little additional cost. These indicators prove useful, stable and complementary for OOD detection. We also introduce a surprisingly simple, yet effective summary indicator, shown to perform well across several networks and datasets. It can easily be tuned further as soon as samples become available. Overall, interesting results can be reached in all but the most severe settings, for which it was a priori doubtful that a data-free solution could be found.

    The third scenario relates to transferring the knowledge of a large model into a smaller one in the absence of data. To do so, we propose to leverage a collection of unlabeled data, which is easy to come by in domains such as image classification. Two schemes are proposed (and then analyzed) to provide optimal transfer. Firstly, we propose a biasing mechanism in the choice of unlabeled data so that the focus is on the most relevant samples. Secondly, we design a teaching mechanism, applicable to almost all pairs of large and small networks, which allows for much better knowledge transfer between them. Overall, good results are obtainable in decent time, provided the collection actually contains relevant samples.

    The fourth scenario tackles the problem of interpretability: what knowledge can be gleaned, more or less indirectly, from data. We discuss two subproblems. The first one is to show that GIFs (cf. supra) can be used to derive intrinsically interpretable models. The second consists of a comparative study between methods and types of models (namely decision forests and neural networks) for the specific purpose of quantifying how important each variable is in a given problem. After a preliminary study on benchmark datasets, the analysis turns to a concrete biological problem: inferring gene regulatory networks from data. An ambivalent conclusion is reached: neural networks can be made to predict better than decision forests in almost all instances but struggle to identify the relevant variables in some situations. It would seem that better (motivated) methods need to be proposed for neural networks, especially in the face of highly non-linear problems.
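
    The abstract does not spell out the biasing or teaching mechanisms, so the following PyTorch sketch only illustrates the general idea behind the third scenario: a small student network is distilled from a large teacher on unlabeled data, with each sample weighted by the teacher's confidence as a crude stand-in for a relevance bias. The architectures, temperature and synthetic data are arbitrary assumptions, not values or methods from the thesis.

```python
# Illustrative distillation sketch (not the thesis's algorithm): a small student
# network is trained to mimic a larger teacher on unlabeled data, with each
# sample weighted by the teacher's confidence as a crude "relevance" bias.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

unlabeled = torch.randn(512, 64)   # stand-in for an unlabeled collection
temperature = 4.0

for epoch in range(5):
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(unlabeled)
        t_probs = F.softmax(t_logits / temperature, dim=1)
        # Teacher confidence, used here as a simple per-sample weight.
        weights = t_probs.max(dim=1).values

    s_log_probs = F.log_softmax(student(unlabeled) / temperature, dim=1)
    # Per-sample KL divergence between teacher and student distributions.
    kl = (t_probs * (t_probs.clamp_min(1e-12).log() - s_log_probs)).sum(dim=1)
    loss = (weights * kl).mean() * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: weighted KL = {loss.item():.4f}")
```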

    Collaborative analysis of multi-gigapixel imaging data using Cytomine

    Motivation: Collaborative analysis of massive imaging datasets is essential to enable scientific discoveries. Results: We developed Cytomine to foster active and distributed collaboration of multidisciplinary teams for large-scale image-based studies. It uses web development methodologies and machine learning to readily organize, explore, share and analyze (semantically and quantitatively) multi-gigapixel imaging data over the internet. We illustrate how it has been used in several biomedical applications.

    Intracapsular pressures in the flexion-abduction-external rotation and flexion-adduction-internal rotation tests and their comparison with classic hip range of motion: a cadaveric assessment

    Background: The Flexion-Abduction-External-Rotation and Flexion-Adduction-Internal-Rotation tests are used to reproduce pain at the hip during clinical assessment. Although pain can be elicited by high intracapsular pressure, no information has been provided regarding intracapsular pressure during these pain-provocation tests. Methods: Eight hip joints from four cadaveric specimens (78.5 ± 7.9 years) were assessed using intra-osseous tunnels reaching the lateral and acetabular compartments. To simulate synovial fluid, 2.7 ml of liquid was inserted into both compartments using adaptor injectors. Optic pressure transducers were used to measure pressure variations. Pressures were compared between compartments in each test and between tests for each compartment. Both tests were also compared with uniplanar movements. Findings: The Flexion-Adduction-Internal-Rotation test showed a significant difference between the pressure measured in the lateral compartment (27.17 ± 42.63 mmHg) and in the acetabular compartment (−26.80 ± 29.26 mmHg) (P < 0.006). The pressure measured in the lateral compartment during the Flexion-Adduction-Internal-Rotation test (27.17 ± 42.63 mmHg) was significantly higher than in the Flexion-Abduction-External-Rotation test (−8.09 ± 15.09 mmHg) (P < 0.010). The pressure measured in the lateral compartment in the Flexion-Abduction-External-Rotation test was significantly lower than during internal rotation (P = 0.011) and extension (P = 0.006). Interpretation: High intracapsular pressure is correlated with greater pain at the hip. Clinicians should assess pain with caution during the Flexion-Adduction-Internal-Rotation test, as this test showed high intracapsular pressures in the lateral compartment. The Flexion-Abduction-External-Rotation test is not influenced by high intracapsular pressures.

    High prevalence of epilepsy in onchocerciasis endemic regions in the Democratic Republic of the Congo

    Background: An increased prevalence of epilepsy has been reported in many onchocerciasis endemic areas. The objective of this study was to determine the prevalence of epilepsy in onchocerciasis endemic areas in the Democratic Republic of the Congo (DRC) and to investigate whether a higher annual intake of Ivermectin was associated with a lower prevalence of epilepsy. Methodology/Principal findings: Between July 2014 and February 2016, house-to-house epilepsy prevalence surveys were carried out in areas with a high level of onchocerciasis endemicity: 3 localities in the Bas-Uele, 24 in the Tshopo and 21 in the Ituri province. Ivermectin uptake was recorded for every household member. This database allowed a matched case-control pair subset to be created, which enabled putative risk factors for epilepsy to be tested using univariate logistic regression models. Risk factors relating to onchocerciasis were tested using a multivariate random effects model. To identify the presence of clusters of epilepsy cases, Kulldorff's scan statistic was used. Of 12,408 people examined in the different health areas, 407 (3.3%) were found to have a history of epilepsy. A high prevalence of epilepsy was observed in health areas in the 3 provinces: 6.8–8.5% in Bas-Uele, 0.8–7.4% in Tshopo and 3.6–6.2% in Ituri. The median age of epilepsy onset was 9 years and the modal age 12 years. The case-control analysis demonstrated that, before the appearance of epilepsy, persons with epilepsy were around two times less likely (OR: 0.52; 95% CI: 0.28–0.98) to have taken Ivermectin than controls during the same life period. After the appearance of epilepsy, there was no difference in Ivermectin intake between cases and controls. Only in Ituri was a significant cluster (p = 0.0001) identified, located around the Draju sample site area. Conclusions: The prevalence of epilepsy in health areas in onchocerciasis endemic regions in the DRC was 2–10 times higher than in non-onchocerciasis endemic regions in Africa. Our data suggest that Ivermectin protects against epilepsy in an onchocerciasis endemic region. However, a prospective population-based intervention study is needed to confirm this.
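
    For readers unfamiliar with how an odds ratio of this kind is obtained from matched case-control pairs, the sketch below implements the standard discordant-pair estimator for 1:1 matching with a Wald-type confidence interval. The pair counts in the example are invented purely for illustration; they are not the study's data, and the study itself relied on logistic regression models.

```python
# Sketch: odds ratio from 1:1 matched case-control pairs via discordant pairs.
# The pair counts below are hypothetical and for illustration only.
import math

def matched_pairs_odds_ratio(case_exposed_only, control_exposed_only, z=1.96):
    """Conditional ML odds ratio for 1:1 matching with a Wald-type CI.

    case_exposed_only    -- pairs where only the case had the exposure
    control_exposed_only -- pairs where only the control had the exposure
    """
    odds_ratio = case_exposed_only / control_exposed_only
    se_log_or = math.sqrt(1.0 / case_exposed_only + 1.0 / control_exposed_only)
    lo = math.exp(math.log(odds_ratio) - z * se_log_or)
    hi = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, (lo, hi)

# Hypothetical discordant-pair counts (illustrative only).
or_, ci = matched_pairs_odds_ratio(case_exposed_only=12, control_exposed_only=20)
print(f"OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```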

    An overview of machine learning (Un tour d'horizon de l'apprentissage automatique)

    Since the dawn of machine learning (ML), it hasn’t stopped spreading into our everyday lives in new, creative ways. Why have Google, Facebook, Amazon and the like invested so much in ML recently? What can (and can’t) ML actually do for us? In this non-technical (and non-exhaustive) talk, we will examine ML applications ranging from standard uses to some of the most exotic ones.

    Generic image classification: randomized and convolutional approaches (Classification générique d'images : approches aléatoires et convolutionnelles)

    Supervised learning introduces genericity in the field of image classification, thus enabling fast progress in the domain. Genericity does not imply ease of use, however, and the most accurate methods, namely convolutional neural networks, lack it. In this master's thesis, we propose an alternative approach relying on extremely randomized trees and random subwindow extraction, combined with elements of convolutional networks. We explore two modes of use of the forest: primarily a direct approach where the forest is the final classifier (ET-DIC) and, to a lesser extent, a preprocessing step where the forest is used to build a visual dictionary and the actual classification is carried out by a support vector machine (ET-FL). We show that, in both modes, our scheme performs better than without the convolutional network elements, although it does not yet reach the performance of the convolutional networks themselves. The ET-DIC variant stays closer to the usual advantages of classification forests but is less accurate. This is highlighted by the remarkable stability of the ET-DIC mode: this stability accounts for the ease of use of the method but also prevents elaborate optimization. We reached an accuracy of 0.613, whereas the record for this mode without the convolutional network elements was 0.5367. The ET-FL variant produces better results at the cost of greater variability in accuracy, due to the loss of the ability to favor the most interesting filters and to greater overfitting, a consequence of losing the ensemble smoothing effect. The accuracies range from 0.55 to 0.7431 depending on the choice of hyper-parameters. The computational cost of both methods is, however, much greater than with a traditional forest.
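
    For context, one common way of classifying images with extremely randomized trees on random subwindows (without the convolutional elements added in this thesis) can be sketched roughly as follows; the patch size, number of subwindows, synthetic images and aggregation by averaged class probabilities are illustrative assumptions, not the thesis's exact ET-DIC setup.

```python
# Rough sketch: extremely randomized trees trained on randomly extracted
# subwindows, with image-level prediction by averaging per-subwindow class
# probabilities. Sizes and data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_images, image_size, patch_size, patches_per_image = 100, 32, 16, 10
images = rng.random((n_images, image_size, image_size))
labels = rng.integers(0, 5, size=n_images)

def extract_subwindows(img, n, size):
    """Return n randomly located size x size patches, flattened."""
    coords = rng.integers(0, img.shape[0] - size + 1, size=(n, 2))
    return np.array([img[r:r + size, c:c + size].ravel() for r, c in coords])

X = np.vstack([extract_subwindows(img, patches_per_image, patch_size)
               for img in images])
y = np.repeat(labels, patches_per_image)  # each patch inherits the image label

forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

# Image-level prediction: average the probabilities over its subwindows.
test_patches = extract_subwindows(images[0], patches_per_image, patch_size)
image_proba = forest.predict_proba(test_patches).mean(axis=0)
print("predicted class:", forest.classes_[image_proba.argmax()])
```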

    Joint learning and pruning of decision forests

    Decision forests such as Random Forests and Extremely Randomized Trees are state-of-the-art supervised learning methods. Unfortunately, they tend to consume a lot of memory. In this work, we propose an alternative algorithm for deriving decision forests under heavy memory constraints. We show that, under such constraints, our method usually outperforms simpler baselines and can sometimes even beat the original forest.
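
    The abstract does not describe the algorithm itself (the Globally Induced Forest discussed in the thesis above); as a simple point of reference, one baseline a memory-constrained method might be measured against is a standard forest whose footprint is capped by limiting the number of trees and leaves, as in this hedged scikit-learn sketch. The node budget and dataset are arbitrary choices for illustration.

```python
# Simple memory-constrained baseline (not the proposed algorithm): cap the
# forest's node budget by limiting tree count and leaves per tree.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

node_budget = 2000        # rough cap on the total number of nodes in the forest
n_trees = 20
max_leaf_nodes = node_budget // (2 * n_trees)  # a binary tree has ~2L-1 nodes for L leaves

forest = ExtraTreesClassifier(n_estimators=n_trees,
                              max_leaf_nodes=max_leaf_nodes,
                              random_state=0).fit(X_tr, y_tr)

total_nodes = sum(t.tree_.node_count for t in forest.estimators_)
print(f"total nodes: {total_nodes}, test accuracy: {forest.score(X_te, y_te):.3f}")
```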

    Sample-Free White-Box Out-of-Distribution Detection for Deep Learning

    Being able to detect irrelevant test examples with respect to deployed deep learning models is paramount to using them properly and safely. In this paper, we address the problem of rejecting such out-of-distribution (OOD) samples in a fully sample-free way, i.e., without requiring any access to in-distribution or OOD samples. We propose several indicators which can be computed alongside the prediction with little additional cost, assuming white-box access to the network. These indicators prove useful, stable and complementary for OOD detection on frequently used architectures. We also introduce a surprisingly simple, yet effective summary OOD indicator. This indicator is shown to perform well across several networks and datasets and can furthermore be easily tuned as soon as samples become available. Lastly, we discuss how to exploit this summary indicator in real-world settings.
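
    The paper's own white-box indicators are not reproduced here; as a point of reference, the widely used maximum-softmax-probability baseline for OOD scoring, which likewise needs only access to the network's outputs, can be sketched as follows. The stand-in network and inputs are placeholders, not the paper's setup.

```python
# Generic OOD scoring baseline (maximum softmax probability), shown only as a
# point of reference; these are not the indicators proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))  # stand-in network

def ood_score(x, temperature=1.0):
    """Higher score = more likely out-of-distribution (1 - max softmax prob)."""
    with torch.no_grad():
        probs = F.softmax(model(x) / temperature, dim=1)
    return 1.0 - probs.max(dim=1).values

in_batch = torch.randn(8, 128)          # stand-in for in-distribution inputs
far_batch = torch.randn(8, 128) * 10.0  # stand-in for atypical inputs
print("in-distribution scores :", ood_score(in_batch))
print("shifted-input scores   :", ood_score(far_batch))
```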